parametric function
Inference Compute-Optimal Video Vision Language Models
Wang, Peiqi, Peng, ShengYun, Zhang, Xuewen, Yu, Hanchao, Yang, Yibo, Huang, Lifu, Liu, Fujun, Wang, Qifan
This work investigates the optimal allocation of inference compute across three key scaling factors in video vision language models: language model size, frame count, and the number of visual tokens per frame. While prior work typically focuses on optimizing model efficiency or improving performance without considering resource constraints, we instead identify the optimal model configuration under fixed inference compute budgets. We conduct large-scale training sweeps and careful parametric modeling of task performance to identify the inference compute-optimal frontier. Our experiments reveal how task performance depends on the scaling factors and finetuning data size, as well as how changes in data size shift the compute-optimal frontier. These findings translate to practical tips for selecting these scaling factors.
PHODCOS: Pythagorean Hodograph-based Differentiable Coordinate System
Arrizabalaga, Jon, Vega, Fausto, Šír, Zbyněk, Manchester, Zachary, Ryll, Markus
This paper presents PHODCOS, an algorithm that assigns a moving coordinate system to a given curve. The parametric functions underlying the coordinate system, i.e., the path function, the moving frame and its angular velocity, are exact -- approximation free -- differentiable, and sufficiently continuous. This allows for computing a coordinate system for highly nonlinear curves, while remaining compliant with autonomous navigation algorithms that require first and second order gradient information. In addition, the coordinate system obtained by PHODCOS is fully defined by a finite number of coefficients, which may then be used to compute additional geometric properties of the curve, such as arc-length, curvature, torsion, etc. Therefore, PHODCOS presents an appealing paradigm to enhance the geometrical awareness of existing guidance and navigation algorithms for on-orbit spacecraft maneuvers. The PHODCOS algorithm is presented alongside an analysis of its error and approximation order, which guarantees that the obtained coordinate system matches the given curve within a desired tolerance. To demonstrate the applicability of the coordinate system resulting from PHODCOS, we present numerical examples in the Near Rectilinear Halo Orbit (NRHO) for the Lunar Gateway.
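As a rough illustration of why Pythagorean hodograph (PH) curves make geometric quantities such as arc length available from a finite set of coefficients, the sketch below builds a planar PH curve with NumPy. It is not the PHODCOS algorithm: the planar setting and the preimage polynomials `u` and `v` are assumptions chosen only for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial as P

# Hypothetical preimage polynomials (illustration only, not from the paper).
u = P([1.0, 2.0])    # u(t) = 1 + 2t
v = P([0.5, -1.0])   # v(t) = 0.5 - t

# Hodograph (first derivative) of the planar curve r(t) = (x(t), y(t)).
dx = u * u - v * v
dy = 2 * u * v

# Pythagorean condition: dx^2 + dy^2 = sigma^2, with sigma itself a polynomial.
sigma = u * u + v * v

# The parametric speed is polynomial, so arc length is an exact polynomial too.
s = sigma.integ()    # antiderivative with s(0) = 0


def arc_length(t0, t1):
    """Exact arc length of the curve between parameters t0 and t1."""
    return s(t1) - s(t0)


def unit_tangent(t):
    """Unit tangent vector; rational in t because sigma is polynomial."""
    return np.array([dx(t), dy(t)]) / sigma(t)


print(arc_length(0.0, 1.0))
print(unit_tangent(0.3))
```

Because the speed `sigma` is a polynomial rather than the square root of one, the arc length needs no numerical quadrature, which is the property the abstract alludes to when it describes the coordinate system as exact and defined by finitely many coefficients.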
Neural Term Structure of Additive Process for Option Pricing
Providing an arbitrage-free valuation formula and specifying risk-neutral dynamics are essentially two sides of the same coin in option pricing. Yet, the modeling methodology has been leaning towards the latter for decades. That is, the invention of an option pricing model typically starts with proposing a stochastic process that is a martingale for the underlying asset, so that the corresponding risk-neutral measure is constructed, and henceforth the arbitrage-free option valuation can be determined either analytically or numerically. Such a methodology was established through the pioneering work of Bachelier [4] and Black and Scholes [9], and since then, almost all of the prevailing models have been invented along this paradigm. The list includes but is not limited to local volatility models by Dupire [17], Cox [14], stochastic volatility models by Heston [20], Hagan et al. [18], Bates [8], jump-diffusion models by Merton [28], Kou [24], and other models built upon Lévy processes by Madan et al. [26], Barndorff-Nielsen [7]. Nonetheless, the reverse approach, which first provides an arbitrage-free valuation formula as in Carr and Madan [11], Davis and Hobson [15] and then finds the underlying martingale supporting the formula, is still possible, as noted in [21, 27]. In recent work, Carr and Torricelli [12] start with one particular pricing formula that yields logistically distributed marginals. Although there is no underlying Lévy process that produces such marginals, by allowing the increment to be nonstationary, an additive logistic process can be constructed to support that pricing formula.
Leveraging PAC-Bayes Theory and Gibbs Distributions for Generalization Bounds with Complexity Measures
Viallard, Paul, Emonet, Rémi, Habrard, Amaury, Morvant, Emilie, Zantedeschi, Valentina
In statistical learning theory, a generalization bound usually involves a complexity measure imposed by the considered theoretical framework. This limits the scope of such bounds, as other forms of capacity measures or regularizations are used in algorithms. In this paper, we leverage the framework of disintegrated PAC-Bayes bounds to derive a general generalization bound instantiable with arbitrary complexity measures. One trick to prove such a result involves considering a commonly used family of distributions: the Gibbs distributions. Our bound stands in probability jointly over the hypothesis and the learning sample, which allows the complexity to be adapted to the generalization gap as it can be customized to fit both the hypothesis class and the task.
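To make the role of the Gibbs distributions more concrete, here is a minimal sketch over a finite hypothesis set, where each hypothesis is weighted by the exponential of its negated complexity. The finite setting, the function name `gibbs_distribution`, the temperature-like parameter `alpha`, and the toy complexity values are all assumptions for illustration; the paper's exact parametrization may differ.

```python
import numpy as np


def gibbs_distribution(complexity, prior, alpha):
    """Return rho(h) proportional to prior(h) * exp(-alpha * complexity(h)).

    `complexity` holds one (arbitrary) complexity value per hypothesis,
    `prior` is a reference distribution over the same hypotheses, and
    `alpha` >= 0 controls how strongly low complexity is favoured.
    """
    complexity = np.asarray(complexity, dtype=float)
    prior = np.asarray(prior, dtype=float)
    log_w = np.log(prior) - alpha * complexity
    log_w -= log_w.max()          # shift for numerical stability
    w = np.exp(log_w)
    return w / w.sum()            # normalise to a probability vector


# Toy, made-up complexity values for three hypotheses (illustration only).
mu = np.array([0.2, 1.0, 3.0])
pi = np.full(3, 1.0 / 3.0)        # uniform prior
rho = gibbs_distribution(mu, pi, alpha=2.0)
print(rho)
```

Any real-valued complexity measure can be plugged in for `mu`, which mirrors the abstract's point that the bound can be instantiated with arbitrary complexity measures; setting `alpha=0` recovers the prior.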
Learnable Subspace Clustering
Li, Jun, Liu, Hongfu, Tao, Zhiqiang, Zhao, Handong, Fu, Yun
This paper studies the large-scale subspace clustering (LSSC) problem with millions of data points. Many popular subspace clustering methods cannot directly handle the LSSC problem, although they are considered state-of-the-art on small-scale data. A basic reason is that these methods often use all data points as one big dictionary to build huge coding models, which results in high time and space complexity. In this paper, we develop a learnable subspace clustering paradigm to efficiently solve the LSSC problem. The key idea is to learn a parametric function that partitions the high-dimensional subspaces into their underlying low-dimensional subspaces, instead of relying on the expensive classical coding models. Moreover, we propose a unified robust predictive coding machine (RPCM) to learn the parametric function, which can be solved by an alternating minimization algorithm. In addition, we provide a bounded contraction analysis of the parametric function. To the best of our knowledge, this paper is the first work among subspace clustering methods to efficiently cluster millions of data points. Experiments on million-scale datasets verify that our paradigm outperforms the related state-of-the-art methods in both efficiency and effectiveness.
How to use Chainer for Theano users
As we mentioned on our blog, Theano will stop development in a few weeks. Many aspects of Chainer were inspired by Theano's clean interface design, so we would like to introduce Chainer to users of Theano. We hope this article helps interested Theano users move to Chainer easily. First, let's summarize the key similarities and differences between Theano and Chainer. In this post, we assume that the modules below have been imported.